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Abstract — This paper addresses the problem of low-rank 
distance matrix completion. This problem amounts to recover 
the missing entries of a distance matrix when the dimension of 
the data embedding space is possibly unknown but small com- 
pared to the number of considered data points. The focus is on 
high-dimensional problems. We recast the considered problem 
into an optimization problem over the set of low-rank positive 
semidefinite matrices and propose two efficient algorithms for 
low-rank distance matrix completion. In addition, we propose 
a strategy to determine the dimension of the embedding space. 
The resulting algorithms scale to high-dimensional problems 
and monotonically converge to a global solution of the problem. 
Finally, numerical experiments illustrate the good performance 
of the proposed algorithms on benchmarks. 

This is the pre-print version of [1]. 

I. INTRODUCTION 

Completing the missing entries of a matrix under low-rank 
constraint is a fundamental and recurrent problem in many 
modem engineering applications (see [2] and references 
therein). Recently, the problem has gained much popularity 
thanks to collaborative filtering applications and the Netflix 
challenge [3]. 

This paper focuses on an important variant of the prob- 
lem, that is, completing the missing entries of a Euclidean 
distance matrix (EDM) under low-rank constraint. Typical 
applications include data visualization [4], dimensionality re- 
duction in behavioral sciences and economics [5], molecular 
conformation problems [6], [7], just to name a few. 



A Euclidean distance matrix D G 



contains the 



(squared) pairwise distances between n data points y^ £ W , 
i = l,...,n. This matrix is symmetric and has a zero 
diagonal. Its entries are non-negative and satisfy the triangle 
inequality. These properties are readily verified by examining 
the entries of the distance matrix. 
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The set EDM(n) of n-by-n Euclidean distance matrices 
forms a convex cone which has a well-studied geometry (see 
[8], [9], and references therein). One property of a Euclidean 
distance matrix is that it is rank deficient. The rank of D is 
upper bounded by r + 2 (and the rank is generically r + 2), 
which in many problems is very small compared to n, the 
number of data points. 

Given a set of pairwise distances or dissimilarities between 
data points, the goal of low-rank distance matrix completion 
algorithms is to recover a full Euclidean distance matrix from 
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a restrictive set of given distances. Inference on the unknown 
entries is possible thanks to the low-rank property which 
models the redundancy between the available data. 

A closely related problem is multidimensional scaling 
(MDS) for which all pairwise distances are available up 
front. A solution to this problem is the classical multidimen- 
sional scaling algorithm (CMDS), which relies on singular 
value decomposition to find a globally optimum embedding 
of fixed-rank. The CMDS algorithm minimizes the total 
quadratic error on scalar products between data points. Other 
algorithms have focused on variant cost functions, see the 
paper [10] for a survey in this area. 

In contrast to the classical multidimensional scaling for- 
mulation, the problem of Euclidean distance matrix com- 
pletion involves missing distances. The problem can be 
considered as a variant of multidimensional scaling problem 
with binary weights [10], [11]. The low-rank distance matrix 
completion problem is known to be NP-hard in general [12], 
[13], but convex relaxations have been proposed to render 
the problem tractable [14], [15]. Typical convex relaxations 
cast the EDM completion problem into a convex optimization 
problem on the set of positive semidefinite matrix, resulting 
in semidefinite programming techniques [16]. This convex 
formulation is nevertheless a large-scale problem when n is 
large. 

Imposing the rank constraint in the problem formulation 
is an appealing way of reducing the size of the search space. 
However, it results in a non-convex optimization problem. 
Although convergence results are only local, the approach 
performs well in practice [17]. Both first-oder [18], [19] and 
second order [7], [20], [11], [21] optimization methods have 
been considered and heuristics for finding a good low-rank 
initialization have been proposed [21]. 

A difficulty encountered by second order optimization 
algorithms is the intrinsic invariance properties of the data 
representation due to rotations. This issue may indeed pre- 
vent second order optimization algorithms to converge [22] . 
Several authors have resolved this issue at the extra cost 
of normalizing the data representation [11] or adding a 
penalization term to the objective function [20]. In this paper, 
the invariance to rotations is lifted in the problem formulation 
and is free of additional computational cost (see Section Ullb. 
A survey of low-rank distance matrix completion algorithms 
can be found in the recent papers [23], [21]. 

Although, the problem is not new and is well-studied, a 
practical limitation of most of existing algorithms is that they 
do not scale to high-dimensional problems. Moreover, the 
problem of choosing a priori an appropriate dimension for 
the data embedding is still an open research question. 



In this paper, the focus is on efficient algorithms that 
scale to high-dimensional problems. Following a number 
of previous contributions in the literature, we recast the 
problem into an optimization problem over the set of low- 
rank positive semidefinite matrices. We adopt the geomet- 
ric optimization framework of optimization on Riemannian 
matrix manifolds [24]. Our main contribution is to extend 
the framework developed in [25] to the problem of low- 
rank distance matrix completion. This results in an efficient 
strategy for estimating the dimension of the embedding 
space. The proposed algorithms have linear complexity in 
the problem size and in the number of available distances. 
The strategy for estimating the optimal embedding dimension 
ensures that the proposed algorithms converge monotonically 
to the global (low-rank) solution of the problem. 

The paper is organized as follow. Section |II] presents the 
problem of interest and its different formulations. Section Hill 
describes the chosen optimization framework and introduces 
the main geometrical objects required by our algorithms. 
Section |IV] is devoted to the design of efficient algorithms 
for low -rank distance matrix completion. Finally, Section [Vl] 
presents some numerical simulations. 

II. LOW-RANK DISTANCE MATRIX COMPLETION 

Given a set of dissimilarities Dy > between n data 
points, distance matrix completion algorithms solve 



min ||H0(D-D)||^, 

DGEDM(n) 



(1) 



where H is a symmetric matrix with binary entries and the 
operator denotes elementwise multiplication. If V is the 
set of given entries {i,j) in D such that i < j, then 



H, 



H 



j'l 



1 if(^,.7)eP, 
otherwise. 



The number of elements in the set V is denoted by d. 
Although, d is at most equal to n{n — l)/2, in most 
applications, it is of order 0{jir), where r is the optimal 
embedding dimension. Dissimilarities potentially differ from 
distances in that they are not required to satisfy triangle 
inequality. For instance, this takes into account the fact that 
observation noise could make D different from a valid EDM. 
A convenient alternative formulation of ^ is to cast 
this problem into an optimization problem on the set of 
positive semidefinite matrices [14]. The reformulation hinges 
on a classical result by Schoenberg which relates Euclidean 
distance matrices and positive semidefinite matrices of rank 
equal to the dimension of the embedding space [26]. The 
corresponding reformulation can be written as 



nun ||H0(«(X)-D)|||, 



(2) 



where k is a mapping from the set of positive semidefinite 
matrices to the set of Euclidean distance matrices 

k(X) = Diag(X)l^ + IDiag(X)^ - 2X. 

The function Diag(-) extracts the diagonal of its argument, 
and 1 denotes a vector with all entries equal to one. 



A practical advantage of ^ compared to ([T} is that the 
rank of X identifies with the dimension of the embedding 
space. When no restriction is imposed on the rank of X, 
problem (|2]i is convex and thus presents a global solution. 

In this paper, we consider the case where the global 
solution X* of (|2]i is low-rank that is. 



rank(X*) = r < n. 



(3) 



Following [25], we solve a sequence of nonconvex problems 
of increasing dimension until the actual value of the rank r 
is reached. Each nonconvex problem consists in solving the 
following rank-constrained optimization problem 

min ||H0(k(X) -D)|||, s.t. rank(X) = p. (4) 

By screening values from p = 1 to p ^ r, the results 
presented in [25] guarantee a monotonic convergence to a 
solution of the original problem ([2). The proposed strategy 
for finding the actual rank r is detailed in Section IIV-CI 

Problem (|4]i is solved efficiently by exploiting a low-rank 
parametrization of the search space. The proposed approach 
hinges on the fact that any rank-p positive semidefinite matrix 
admits a factorization 



X = YY' 



where Y G R"''^ = {Y e M"^p : dct(Y'^Y) 7^ 0}. 

To exploit this factorization, we adopt the geometric 
framework of optimization on Riemannian manifolds [24]. 
Basic concepts and notations are introduced in the next 
section. See the book [24] for further details on optimization 
on matrix manifolds and for a state-of-the-art in this area. 

III. MANIFOLD-BASED OPTIMIZATION 

An intrinsic property of the factorization X = YY^ is 
that it is invariant with respect to the transformation 

Yh^YQ, 

where Q G 0{p) = {Q G Wp : Q'^Q = QQ"^ = I}. 

This invariance property renders the minima of a cost 
function /(YY^) not isolated. This issue is not harmful for 
first order-methods such as gradient descent algorithms but 
greatly affects the convergence properties of second-order 
methods [24], [22]. 

To circumvent this issue, we reformulate the problem of 
interest as an optimization problem on the quotient manifold 

M^S+{p,n)c^Kr''/Oip), (5) 

which represents the set of equivalence classes 

[Y] = {YQ : Q G 0{p)}. (6) 

The set S+ {p, n) is the set of rank-p symmetric positive 
semidefinite matrices of size n, that is. 



S+{p,n) = {X G 



X = X^ ^ 0, rank(X) =p}. 



This set has a rich Riemannian manifold geometry which 
can be exploited for algorithmic purposes [27], [28], [29]. 



Problem (|4]i is now reformulated as an unconstrained 
optimization problem over the set of equivalence classes ^, 



[Y]eM 



(7) 



for the cost function 

/([Y]) = ||H0(«.(YY^)-D)|||. (8) 

To develop optimization algorithms on the quotient mani- 
fold, the tangent space TyM of ^ is endowed with the 
Riemannian metric 

5y(Cy,?7y) = TiiS.YiTY), '?Y,'7Y e T^M, 

which is inherited from the natural metric of M"^p. With 
this metric, the tangent space TyM at a given point Y is 
decomposed into the sum of two complementary spaces, 

TyM = VyM®HyM. 

The vertical space Vy-M contains the set of directions that 
are tangent to the set of equivalence classes (|6]l, that is, 

VyM = {Yrj : n^ = -n e RP^'P}. 

The horizontal space Hy-M contains the directions ^y that 
are orthogonal to the set of equivalence classes, 

HyM = {Cy e K"^P : e'vY = Y'^^y}- 

With such a construction, the directions of interest can be 
restricted to horizontal directions £,y- Indeed, displacements 
along vertical directions leave the cost function unchanged. 
The projection of a direction Z G IR"^^ onto the horizontal 
space is given by Uu^iZ) = Z - YJ7, where n e KP^'p is 
skew-symmetric and satisfies the Sylvester equation 

nY^Y + Y^Yfl = Y^Z - Z^Y. 

Overall, projecting a direction Z e M"^p onto the horizontal 
space requires 0{np^ + np + p^) operations (computing 
matrices Y-'^Y, Y-'^Z, and Yil requires 0{np^) operations, 
solving the Sylvester equation is performed in 0{p^) opera- 
tions and the projection requires 0{np) operations). 

To update our search variable, we require a local mapping 
from tangent space to the manifold. Such a mapping is called 
a retraction. For the manifold of interest, a retraction is 
provided by the simple and efficient formula 

RY(CY) = Y + eY. (9) 

which gives a full-rank matrix for generic direction fy- 

IV. ALGORITHMS 

In this section, we exploit the concepts presented in the 
previous section to develop both a gradient descent algorithm 
and a trust-region algorithm to solve d?). 



A. Gradient descent algorithm 

The gradient of a smooth cost function / : A^ ^- M is the 
unique tangent vector grad/(Y) G TyM that satisfies 

.gY(?Y,grad/(Y)) = i?/(Y)[eY], V^y e TVX. (10) 

The quantity Df {Y)[£^y] is the directional derivative of / 
in the direction ^y, that is. 



A/(Y)KY] = lm 



f{Y + t^Y)-f{Y) 



Applying formula ( fTol i to the cost (HJ gives us the gradient 
grad/(Y) =2k*(H0(k(YY^)-D))Y, (11) 
where k*(A) is the adjoint operator of k defined by 

K*(A) = 2(Diag(Al) -A). 

Combining the gradient ( fTTT i with the retraction (|9} gives us 
the gradient descent algorithm 



Yt+i =Yt- 2.stK*{Ii (niYtYj) - D))Yt 



(12) 



where sj > is the gradient step size. We select st using 
the Armijo criterion [30], that is, a step size sa that satisfies 

f(Yt - SAgmdfiYtj) < f{Yt) - C5^||grad/(Y,)|||, 

where c G (0, 1) is a constant (we choose the value c = 0.5). 
The asymptotic computational cost of an iteration ( fT2l i is 
0{dp + np), where d is the number of known entries of D. 
The memory requirement is 0{d + np). The computationally 
most demanding step is the computation of the gradient, 
which requires 0{dp) operations. This low computational 
complexity and memory requirement allows us to handle 
potentially large data sets. A drawback is however that the 
gradient descent algorithm only guarantees a linear conver- 
gence rate. We can achieve a superlinear convergence rate by 
means of a Riemannian trust-region algorithm which exploits 
second-order information. 

B. Trust-region algorithm 

Trust-region algorithms sequentially solve the problem 

_ mill /(Y) +.gY(f,grad/(Y)) + igY(C" Hess/(Y)[C]), 
^eU^M 2 

s.t. gY{lO<5\ 

which amounts to minimize a quadratic model of the cost 
function on a trust-region radius of size 5. Once a search 
direction ^ is identified, the search variable is updated as 



Y 



t+i 



Ry.(0. 



(13) 



The trust-region radius 6 vary according to the quality of 
the iterate. When a good solution is found within the trust- 
region, then the trust-region is expanded. Conversely, if the 
iterate is poor then the region is contracted. 

More technical details on trust-region algorithms on Rie- 
mannian manifolds can be found in [31], [24]. In this paper. 



we adapt the generic implementation of the toolbox GenRTR 
to our problem of interestjj 

Trust-region algorithms require the computation of the 
Riemannian Hessian Hcss/(Y)[7^] in a given direction fj. 
It is obtained as 

Hess/(Y)[77]^V^grad/(Y) 

where Vfjgrad/(Y) is the Riemannian connection of the 
gradient vector field in the direction fj. Riemannian con- 
nections generalize the notion of directional derivative of a 
vector field to Riemannian manifolds. Given a vector field C, 
on Ai that assigns to each point Y £ A^ a tangent vector 
Cy G TyM, the directional derivative of ^ at Y G A4 in a 
direction fj G Hy-M is given by 



v,^ Cy = n«. 



lim 



y+tti 



C^ 



t 



(14) 



Applying this formula to the vector field grad/(Y) gives us 

Hess/(Y)[77] = 2Un.{^*{ii © {<Yfj^ + 77Y^|))Y 
+ n*iHQ{n{YY^)--b))fj). 



The numerical cost of an iteration of the trust-region algo- 
rithm is 0{dp + 7ip + np^ +p'^). The memory requirement is 
0{d + np). The computational bottleneck is the computation 
of the Hessian. Still, the complexity is linear in both the 
number of available distance and in the problem size. With a 
proper parameter tuning, the proposed trust-region algorithm 
enjoys a superlinear convergence rate. 

C. Strategy for estimating the optimal embedding dimension 

The following section is an adaptation of the material 
presented in [25] to the problem of interest. To identify the 
(unknown) rank r of the global solution to (|2]l, we solve a se- 
quence of nonconvex problems (|7]i of increasing dimension. 
The approximation rank p is progressively incremented from 
p = 1 X.O p = r. Using a warm restart strategy for moving 
from one value of p to the next, we are able to propose a 
descent algorithm that converges monotonically to a global 
solution of the original problem (|4]i. 

This strategy efficiently exploits the previous iterations of 
the algorithm as opposed to earlier heuristic methods that 
use random restart for each value of the rank [32]. 

For a given rank p < r, the trust-region or gradient descent 
algorithm gives us a local minimizer Y* of the nonconvex 
problem (|7]i. Let us consider the following initial condition 
for the problem of rank p +1, 

Yo = [y;|o"xi], 

that is, Yp with an additional zero column appended. Since 
Y* is local minimizer for rank p, we have that Yq is a 
critical point for the problem of rank p+1. As Y* is not the 
sought solution to (|2]i, this means that Yq is a saddle point 
for the problem of rank p + 1. Therefore, by virtue of the 

'The software can be downloaded from 

|http: //www .math, f su. edu/-cbaker/GenRTR/| 



second order KKT optimality conditions, there must exists a 
descent direction Z G M"^^ such that 

iTr(Z^i?grad/(Yo)[Z])<0. 

To escape from the saddle point, we can thus exploit the 
following descent direction 

Z = [0"xP|v], 

where v is the eigenvector associated to the smallest alge- 
braic eigenvalue of 



Sy = Vx./(y;y: 



(15) 



and where Vx/(YY^) is the Euclidean gradient of the 
convex cost function /(X) evaluated at YY^. As we have 

grad/(Yo) = Vx./(YoY;f)Yo, 

the proposed direction satisfies 

iTr(Z^i?grad/(Yo)[Z]) = v^Syv < 0. 

The descent direction is exploited by performing a single 
line-search step using the Armijo rule. The resulting iterate 
is then used as the initial condition for the optimization 
algorithm that will solve the problem of rank p+1. 

The procedure stops at the latest when p = n. However, 
in the setting of interest, problem (|2) presents a low-rank 
solution with r ^ n. We thus, expect the algorithm to stop 
much before reaching p = n. 

For the proposed strategy it is important to reach a local 
minimum of the cost function as long as p < r. Although 
in theory convergence to saddle points cannot be excluded 
for gradient descent algorithms, the issue is not harmful 
in practice as saddle point are generally unstable from a 
numerical point of view. 

V. DISCUSSION 

We propose both a gradient descent and a trust-region 
algorithm for solving the fixed-rank Euclidean distance ma- 
trix completion problem. The numerical cost per iteration 
for the gradient descent algorithm is 0{dp + np) versus 
0{dp+np+np^+p^) for the trust region algorithm. Although 
the gradient descent algorithm has a smaller computational 
cost per iteration, the number of iterations required to reach 
convergence is higher than for the trust-region algorithm. 

We thus recommend the trust-region algorithm when a 
high optimization accuracy is required or when the obser- 
vation noise is small. The gradient descent approach should 
be preferred for very large problems where the observation 
noise is high. In this setting, one is usually not interested in 
a solution of high-accuracy, since it generally compromises 
the generalization performance. 

VI. NUMERICAL EXPERIMENTS 

In this section, we evaluate the performance of the pro- 
posed algorithms on benchmarks. A MATLAB implementa- 
tion is available from the first author's webpage|3 

^http : //www .montef lore . ulg . ac .be/~mishra| 



A. A visual example 

This example is adapted from [23]. Consider n = 121 data 
points arranged in a 3-dimensional helix structure defined by 

{x,y,z) = (4cos(3t),4sin(3t),2t), < i < 27r. 

After computing the distance matrix of between these 
points, we randomly remove 85% of the distances uniformly 
and at random to generate a dissimilarity matrix D. From 
15% of distances the goal is to reconstruct the helix structure. 
We run the algorithms with the rank incremental strategy dis - 
cussed in Section ( IIV-CI ). Both algorithms recover correctly 
the helix structure (Figure [T). We only display the results for 
gradient descent as it coincides with the results of the trust- 
region algorithm. The algorithms stop when the relative or 
absolute variation of the cost function drops below 10"''. 
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Fig. 1. The proposed algorithms correctly recover the 3D helix structure 
forin 15% of the complete set of pairwise distances. 



B. Trust-region versus gradient descent 

To compare the two versions of the algorithm, we generate 



a random distance matrix 



D* = k(Y*Y*^), 



(16) 



where Y* G M^oo^^ has entries distributed according to 
gaussian distribution with zero mean and unit standard 
deviation. The fraction of unknown distances is fixed at 
85%. We run the algorithms without knowing the embedding 
dimension. The algorithms are stopped when the relative or 
absolute variation of the cost function drops below 10^^. The 
objective function is plotted against the number of iterations 
(Figures [2(a)| and |2(b)| i. Both algorithms recover the correct 
configuration and dimensionality. The trust-region algorithm 
converges in 15.0 seconds and 193 iterations, whereas the 
gradient descent algorithm converges in 19.6 seconds and 
1565 iterations. Observe the monotonic convergence of both 
algorithms to the sought solution. 

C. Scaling test 

We now evaluate our algorithms on larger random data 
sets. We vary the problem size n from 1000 to 10000. For 
each n, we generate a random distance matrix according to 



(dill with Y* e K"^"*. We sample 0.1 fraction of the total 
amount of distances and the algorithms are run by fixing 
the embedding dimension, p = 4. Results are averaged over 
10 runs. The test has been performed on a single core Intel 
L5420 2.5 GHz with 5GB of RAM. 

The time taken and number of iterations required to reach 
convergence is reported at Figure [3(a)] and [3(b)] respectively. 
For instance, for n ~ 10000 the number of known distances 
is about 5 millions (10% of 50 million total entries). The 
gradient descent algorithm takes about 120 iterations and 31 
minutes, while the trust region algorithm solves the problem 
in 30 iterations and 18 minutes. 

VII. CONCLUSION 

In this paper two efficient numerical optimization algo- 
rithms have been presented for the distance matrix comple- 
tion problem. In particular, the algorithms do not require any 
prior notion about the embedding and can potentially handle 
very large data sets. The proposed algorithms stem from a 
geometric view of the problem formulation. This interpreta- 
tion as a manifold-based optimization problem considerably 
reduced the computational burden. At the same we were 
able to devise a superlinearly converging scheme namely, 
the trust-region algorithm in addition to the linearly conver- 
gent gradient descent algorithm. The numerical experiments 
that have been performed, are very encouraging on various 
parameters. 
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